mining technique


Linking Heterogeneous Data with Coordinated Agent Flows for Social Media Analysis

Chen, Shifu, Deng, Dazhen, Xu, Zhihong, Xu, Sijia, Peng, Tai-Quan, Wu, Yingcai

arXiv.org Artificial Intelligence

Social media platforms generate massive volumes of heterogeneous data, capturing user behaviors, textual content, temporal dynamics, and network structures. Analyzing such data is crucial for understanding phenomena such as opinion dynamics, community formation, and information diffusion. However, discovering insights from this complex landscape is exploratory, conceptually challenging, and requires expertise in social media mining and visualization. Existing automated approaches, though increasingly leveraging large language models (LLMs), remain largely confined to structured tabular data and cannot adequately address the heterogeneity of social media analysis. We present SIA (Social Insight Agents), an LLM agent system that links heterogeneous multi-modal data -- including raw inputs (e.g., text, network, and behavioral data), intermediate outputs, mined analytical results, and visualization artifacts -- through coordinated agent flows. Guided by a bottom-up taxonomy that connects insight types with suitable mining and visualization techniques, SIA enables agents to plan and execute coherent analysis strategies. To ensure multi-modal integration, it incorporates a data coordinator that unifies tabular, textual, and network data into a consistent flow. Its interactive interface provides a transparent workflow where users can trace, validate, and refine the agent's reasoning, supporting both adaptability and trustworthiness. Through expert-centered case studies and quantitative evaluation, we show that SIA effectively discovers diverse and meaningful insights from social media while supporting human-agent collaboration in complex analytical tasks.


Exploring Traffic Crash Narratives in Jordan Using Text Mining Analytics

Jaradat, Shadi, Alhadidi, Taqwa I., Ashqar, Huthaifa I., Hossain, Ahmed, Elhenawy, Mohammed

arXiv.org Artificial Intelligence

This study explores traffic crash narratives in an attempt to inform and enhance effective traffic safety policies using text-mining analytics. Text mining techniques are employed to unravel key themes and trends within the narratives, aiming to provide a deeper understanding of the factors contributing to traffic crashes. The study collected crash data from five major freeways in Jordan, comprising the narratives of 7,587 records from 2018-2022. An unsupervised learning method was adopted to learn patterns from the crash data. Various text mining techniques, such as topic modeling, keyword extraction, and word co-occurrence networks, were also used to reveal the co-occurrence of crash patterns. Results show that text mining analytics is a promising method and underscore the multifactorial nature of traffic crashes, including intertwining human decisions and vehicular conditions. The recurrent themes across all analyses highlight the need for a balanced approach to road safety, merging both proactive and reactive measures. Emphasis on driver education and awareness around animal-related incidents is paramount.
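As a rough illustration of the word co-occurrence idea mentioned in this abstract, the sketch below counts how often two words appear in the same narrative. It is not the authors' pipeline: the function name `cooccurrence_counts`, the stopword set, and the three sample narratives are all made up for illustration, and a real analysis would likely use proper tokenization and a windowed co-occurrence definition.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(narratives, stopwords=frozenset()):
    """Count how often two distinct words appear in the same narrative."""
    counts = Counter()
    for text in narratives:
        # Lowercase, split on whitespace, and strip simple punctuation.
        tokens = {w.strip(".,;:!?").lower() for w in text.split()}
        tokens = {t for t in tokens if t and t not in stopwords}
        # Each unordered pair of distinct words is one co-occurrence edge.
        for a, b in combinations(sorted(tokens), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical narratives for illustration only; not taken from the study's data.
narratives = [
    "Driver lost control on wet road",
    "Camel crossing road at night",
    "Driver hit camel at night",
]
stop = frozenset({"on", "at"})
edges = cooccurrence_counts(narratives, stop)

# Print the pairs that co-occur in more than one narrative.
for pair, weight in edges.items():
    if weight > 1:
        print(pair, weight)
```

The resulting counter can be read as a weighted edge list: each key is an edge between two words, and the count is the edge weight.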


Text mining in education

Ferreira-Mello, R., Andre, M., Pinheiro, A., Costa, E., Romero, C.

arXiv.org Artificial Intelligence

The explosive growth of online education environments is generating a massive volume of data, especially in text format, from forums, chats, social networks, assessments, and essays, among others. This raises exciting challenges on how to mine text data in order to find useful knowledge for educational stakeholders. Despite the increasing number of educational applications of text mining published recently, we have not found any paper surveying them. In this line, this work presents a systematic overview of the current status of the Educational Text Mining field. Our final goal is to answer three main research questions: Which text mining techniques are most used in educational environments? Which are the most used educational resources? And which are the main applications or educational goals? Finally, we outline the conclusions and the most interesting future trends.


Graph contrastive learning. Getting high quality labeled dataset at…

#artificialintelligence

Getting high-quality labeled datasets at scale for graph-related problems is often expensive. Graph neural networks tend to overfit small training datasets and fail to learn reusable, task-invariant knowledge. Self-supervised learning has been hugely successful in multiple ML areas and has improved label efficiency. These techniques obtain supervisory signals from the unlabelled data by utilising the data's underlying structure. The goal of graph contrastive learning is to learn a low-dimensional representation that encodes the graph's structural and attribute information.
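The article summary does not say which objective is used, so the following is only a minimal sketch of one common choice for contrastive learning, an InfoNCE-style loss, written in plain Python. It assumes two augmented "views" of the same graph have already been encoded, so that `view1[i]` and `view2[i]` are embeddings of the same node; the function names, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view1, view2, temperature=0.5):
    """Average InfoNCE-style loss; view1[i] and view2[i] embed the same node."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        # Similarity of node i in view 1 to every node in view 2.
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        # Negative log-softmax of the positive pair (i, i).
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy 2-D embeddings, invented for illustration.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(aligned, aligned)
loss_swapped = info_nce(aligned, swapped)
print(loss_aligned < loss_swapped)
```

The loss is low when each node's two views are closer to each other than to other nodes, which is the signal that lets an encoder learn without labels.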


Petroleum prices prediction using data mining techniques -- A Review

Weldon, Kiplang'at, Ngechu, John, Everlyne, Ngatho, Njambi, Nancy, Gikunda, Kinyua

arXiv.org Artificial Intelligence

Over the past 20 years, Kenya's demand for petroleum products has proliferated. This is mainly because this particular commodity is used in many sectors of the country's economy. Exchange rates are impacted by constantly shifting prices, which also impact Kenya's industrial output of commodities. The cost of other items produced and even the expansion of the economy is significantly impacted by any change in the price of petroleum products. Therefore, accurate petroleum price forecasting is critical for devising policies that are suitable to curb fuel-related shocks. Data mining techniques are the tools used to find valuable patterns in data. Data mining techniques used in petroleum price prediction, including artificial neural networks (ANNs), support vector machines (SVMs), and intelligent optimization techniques like the genetic algorithm (GA), have grown increasingly popular. This study provides a comprehensive review of the existing data mining techniques for making predictions on petroleum prices. The data mining techniques are classified into regression models, deep neural network models, fuzzy sets and logic, and hybrid models. A detailed discussion of how these models are developed and the accuracy of the models is provided.


Robust self-healing prediction model for high dimensional data

Rayasam, Anirudha, Patil, Nagamma

arXiv.org Artificial Intelligence

Owing to their increased accuracy and their potential to detect unseen patterns, data mining techniques have been widely adopted for standard classification problems. They have often been used for high-precision disease prediction in the medical field, and several hybrid prediction models capable of achieving high accuracy have been proposed. Even so, most previous models fail to efficiently address the recurring issue of bad data quality, which plagues most high-dimensional data and proves especially troublesome in highly sensitive medical data. This work proposes a robust self-healing (RSH) hybrid prediction model that uses the data in its entirety, removing errors and inconsistencies from it rather than discarding any data. Initial processing involves data preparation followed by cleansing or scrubbing through context-dependent attribute correction, which ensures that there is no significant loss of relevant information before the feature selection and prediction phases. An ensemble of heterogeneous classifiers, subjected to local boosting, is used to build the prediction model, and a genetic-algorithm-based wrapper feature selection technique, wrapped on the respective classifiers, is employed to select the corresponding optimal set of features, which warrant higher accuracy. The proposed method is compared with some of the existing high-performing models, and the results are analyzed.


Principal Data Scientist

#artificialintelligence

Crossix is a health-focused technology company dedicated to advancing healthcare marketing with analytics and innovative planning, targeting, measurement, and optimization solutions. Positioned at the center of big data, innovative technology, and multichannel media, Crossix, a Veeva Company, provides our clients with insights to help make strategic business decisions and drive improved patient outcomes. Crossix knows that our employees are integral to our success, which is why we have created an inclusive culture where everyone can thrive. Along with competitive salaries and benefits, we invest in opportunities for career growth, and provide other perks, such as team outings, fitness allowances and professional development. Crossix is headquartered in New York with growing offices in Minsk, Belarus and Kiryat Ono, Israel.


Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review

Salo, Fadi, Injadat, MohammadNoor, Nassif, Ali Bou, Essex, Aleksander

arXiv.org Artificial Intelligence

Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation. The exponential expansion in the deployment of cloud technology has produced a massive amount of data from a variety of applications, resources and platforms. In turn, the rapid rate and volume of data creation has begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance. In this paper, we conduct a systematic literature review (SLR) of data mining techniques (DMT) used in IDS-based solutions through the period 2013-2018. We employed criterion-based, purposive sampling identifying 32 articles, which constitute the primary source of the present survey. After a careful investigation of these articles, we identified 17 separate DMTs deployed in an IDS context. This paper also presents the merits and disadvantages of the various works of current research that implemented DMTs and distributed streaming frameworks (DSF) to detect and/or prevent malicious attacks in a big data environment.


Nowcasting lightning occurrence from commonly available meteorological parameters using machine learning techniques

#artificialintelligence

Lightning discharges in the atmosphere owe their existence to the combination of complex dynamic and microphysical processes. Knowledge discovery and data mining methods can be used for seeking characteristics of data and their teleconnections in complex data clusters. We have used machine learning techniques to successfully hindcast nearby and distant lightning hazards by looking at single-site observations of meteorological parameters. We developed a four-parameter model based on four commonly available surface weather variables (air pressure at station level (QFE), air temperature, relative humidity, and wind speed). The produced warnings are validated using the data from lightning location systems.


6 ways AI and automation could improve process mining

#artificialintelligence

Digital innovation requires enterprises to learn how to understand, manage and change increasingly complicated processes. A new generation of process mining tools promises to make it easier to automatically interpret the digital exhaust of modern enterprises to help improve decision-making, drive innovation, and offer new products and services. "By understanding how processes really operate, companies can create operational fluidity to drive more efficient and productive operations that create better customer experiences," said Alexander Rinke, CEO and co-founder of Celonis, a process mining platform based in Germany. "Instead of simply identifying areas of friction, AI will further evolve process mining by allowing businesses to implement recommended changes with employees, enhancing productivity while also saving resources." The core idea of process mining lies in finding new ways to create and calibrate models of how things work with event logs.
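To make the idea of building a process model from event logs more concrete, here is a small sketch that derives a directly-follows graph, one of the simplest process-mining models, from a toy log. It is an illustration only and does not reflect how Celonis or any other product works; the function name `directly_follows`, the `(case_id, timestamp, activity)` tuple layout, and the sample log are assumptions.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is immediately followed by b within a case.

    event_log: iterable of (case_id, timestamp, activity) tuples.
    """
    traces = defaultdict(list)
    # Order events by case and then by time to recover each case's trace.
    for case_id, _timestamp, activity in sorted(event_log, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    graph = Counter()
    for activities in traces.values():
        # Each consecutive pair of activities is one directly-follows edge.
        for a, b in zip(activities, activities[1:]):
            graph[(a, b)] += 1
    return graph


# Hypothetical event log, invented for illustration.
log = [
    ("order-1", 1, "receive"), ("order-1", 2, "check"), ("order-1", 3, "ship"),
    ("order-2", 1, "receive"), ("order-2", 2, "check"), ("order-2", 3, "reject"),
]
dfg = directly_follows(log)

for (a, b), count in sorted(dfg.items()):
    print(f"{a} -> {b}: {count}")
```

Edge counts like these show where cases diverge (here, after "check"), which is the kind of observed behavior a process model is calibrated against.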